1. A PROFESSIONAL JOY

Few authors would not be pleased when discussants
implement their methods or follow up on their
ideas. It is therefore a professional joy to see every
discussant doing both! Our heartfelt thanks go to all 
discussants, and to the Executive Editor, Ed George, 
for bringing us such joy! 

Incidentally, the three discussions cover nicely the 
three main parts of our paper. Zheng and Lo's dis- 
cussion centers on our motivating application, namely, 
designing follow-up strategies in genetic studies, but 
with the additional consideration of the uncertainty 
in the measures themselves. Doss's discussion fo- 
cuses on the second part of our paper, namely, the 
likelihood-based relative measure, but with applica- 
tions to survival analysis where the use of partial 
likelihood reveals very interesting (and inevitably 
confusing) complications. Chang, Chen, Chien and 
Hsing (hereafter C3H) comment on the third part 
of our paper, the Bayesian measures for small sam- 
ples, and implement variations that are applied to 
problems in infectious disease research and isotonic 
regression. 

Our responses are organized in the aforementioned 
order. We very much appreciate all the key messages 
conveyed by the discussants, though for a few of 
them we offer alternative explanations. Some questions
posed by the discussants make nice Ph.D. or
master's thesis topics, so we summarize them at the
end of this rejoinder.

2. ZHENG AND LO: DESIGN WITH 
UNCERTAINTY 

Zheng and Lo further emphasize the critical role of 
measuring relative information in designing follow- 
up studies, and touch upon the issue of optimal de- 
sign under a given measure. In particular, they con- 
sider a setting with multiple variables, and suggest 
an extension of our harmonic rule (19) for combining 
multiple studies to the setting of combining multi- 
ple variables. Since our rule (19) was derived under 
the assumption that individual studies are indepen- 
dent, we surmise that Zheng and Lo's setting is un- 
der similar considerations, where variables are con- 
sidered to be independent of each other and their 
contributions to the overall log-likelihood are addi- 
tive. Otherwise we will need to consider all variables 
jointly in measuring relative information. Neverthe- 
less, it would be useful to investigate how Zheng and 
Lo's combining rule (1) performs as a quick approx- 
imation to the measure that uses the full likelihood, 
when the independence assumption fails. Zheng and 
Lo's (1) could be quite appealing to a practitioner 
who chooses to deal with multiple variables sepa- 
rately, especially for testing purposes, because of the 
technical difficulty in specifying a reliable large joint 
multivariate model. 

Zheng and Lo also correctly point out that the ac- 
tual test statistics (e.g., log-likelihood ratio) from a 
follow-up study can be quite different from what is 
predicted by our measures of relative information, 
$\mathcal{RI}_1$ and $\mathcal{RI}_0$. There are several different ways of
investigating this uncertainty. Zheng and Lo take
a direct approach, by simulating the actual ratio of the
complete-data log-likelihood ratio to the observed-data
log-likelihood ratio, which they denote by $\widetilde{\mathcal{RI}}_1^{-1}$,
as a function of the missing data. The simulations
are done by drawing the missing data from the conditional
distribution given the observed data and
the parameter value estimated by the observed-data
MLE. In the binomial example, a simulation study
is used to demonstrate that $\mathcal{RI}_1^{-1}$ is the average of
$\widetilde{\mathcal{RI}}_1^{-1}$, which itself exhibits considerable variation.

Here we wish to point out a subtlety. Whereas
$\mathcal{RI}_1^{-1}$ has the nice interpretation of being the ratio
of the expected complete-data lod score to the
observed-data lod score, this expectation is calculated
under the assumption that the value of the
parameter under the alternative hypothesis is the
same as the one under which the (conditional) expectation
is calculated. There is no confusion about
this assumption when the alternative hypothesis is
sharp, that is, when it has a fixed known value.
This is essentially what Zheng and Lo assumed, as
they considered a number of alternative values ($p$ =
0.525, 0.55, 0.65) for their simulation studies. It is
clear that under such a setting,
$E[\widetilde{\mathcal{RI}}_1^{-1} \mid Y_{ob}; \theta = \theta_{ob}] = \mathcal{RI}_1^{-1}$,
by the definition of $\mathcal{RI}_1$.
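The identity for a sharp alternative can be checked numerically in a toy binomial setting. The sketch below is only an illustration under stated assumptions, not Zheng and Lo's simulation: it assumes `x` successes in `n_ob` observed Bernoulli trials, `m` missing trials, a null value `p0` and a fixed alternative `p1`. The function names and all numbers are hypothetical. The missing-data distribution is enumerated exactly rather than sampled.

```python
from math import comb, log

def lod(p1, p0, successes, trials):
    """Log-likelihood ratio of p1 versus p0 for binomial data (natural log)."""
    failures = trials - successes
    return successes * log(p1 / p0) + failures * log((1 - p1) / (1 - p0))

def mean_inverse_ratio(x, n_ob, m, p0, p1):
    """Average of lod_co / lod_ob over the missing successes, drawn under p1."""
    lod_ob = lod(p1, p0, x, n_ob)
    total = 0.0
    for k in range(m + 1):  # k = number of successes among the m missing trials
        weight = comb(m, k) * p1**k * (1 - p1)**(m - k)
        total += weight * lod(p1, p0, x + k, n_ob + m) / lod_ob
    return total

def inverse_ri1(x, n_ob, m, p0, p1):
    """Expected complete-data lod divided by the observed-data lod."""
    # Each missing trial adds the Kullback-Leibler divergence of p1 from p0.
    kl = p1 * log(p1 / p0) + (1 - p1) * log((1 - p1) / (1 - p0))
    lod_ob = lod(p1, p0, x, n_ob)
    return (lod_ob + m * kl) / lod_ob

# Hypothetical data: 33 successes in 50 observed trials, 50 missing trials.
print(mean_inverse_ratio(33, 50, 50, 0.5, 0.65))
print(inverse_ri1(33, 50, 50, 0.5, 0.65))
```

The two printed values agree because, with the alternative held fixed, the complete-data lod score is linear in the number of missing successes.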

However, once we move away from this setting
and allow the use of the actual complete-data lod
score $\mathrm{lod}(\theta_{co}, \theta_0 \mid Y_{co})$, where $\theta_{co}$ is the complete-data
MLE, then things can become much more complicated.
For example, $E[\widetilde{\mathcal{RI}}_1^{-1} \mid Y_{ob}; \theta = \theta_{ob}] = \mathcal{RI}_1^{-1}$ no
longer holds because in general,

(1)  $E[\mathrm{lod}(\theta_{co}, \theta_0 \mid Y_{co}) \mid Y_{ob}; \theta = \theta_{ob}] \ne E[\mathrm{lod}(\theta_{ob}, \theta_0 \mid Y_{co}) \mid Y_{ob}; \theta = \theta_{ob}].$
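Inequality (1) can be seen in a toy binomial example. The sketch below assumes hypothetical values (`x` successes in `n_ob` observed trials with `0 < x < n_ob`, `m` missing trials, null value `p0`) and compares the two conditional expectations by exact enumeration, with the missing data drawn under the observed-data MLE. It is an illustration only; the function names are not from the paper.

```python
from math import comb, log

def lod(p1, p0, successes, trials):
    """Log-likelihood ratio of p1 versus p0 for binomial data (natural log)."""
    failures = trials - successes
    return successes * log(p1 / p0) + failures * log((1 - p1) / (1 - p0))

def expected_lods(x, n_ob, m, p0):
    """Return (E[lod at complete-data MLE], E[lod at observed-data MLE])."""
    p_ob = x / n_ob  # observed-data MLE
    at_mle_co = 0.0
    at_mle_ob = 0.0
    for k in range(m + 1):  # k = number of successes among the m missing trials
        weight = comb(m, k) * p_ob**k * (1 - p_ob)**(m - k)
        p_co = (x + k) / (n_ob + m)  # complete-data MLE for this completion
        at_mle_co += weight * lod(p_co, p0, x + k, n_ob + m)
        at_mle_ob += weight * lod(p_ob, p0, x + k, n_ob + m)
    return at_mle_co, at_mle_ob

left, right = expected_lods(33, 50, 50, 0.5)
print(left, right)
```

In this example the left side exceeds the right side, since the complete-data MLE maximizes the complete-data likelihood for every completion of the data.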

Mathematically, our key identity (13) requires both
$\theta_1$ and $\theta_2$ to be fixed known constants (given the
observed data), so one cannot take $\theta_1 = \theta_{co}$, which
would be a random variable, even after conditioning
on $Y_{ob}$. This technical requirement, however,
is a reflection of a more fundamental difficulty in
measuring (relative) information. If the additional
data change the MLE (i.e., from $\theta_{ob}$ to $\theta_{co}$), which
can be viewed as a "center" of the likelihood, then
measuring relative information, in terms of relative
strength against a null hypothesis, becomes a very
tricky task. Perhaps this is more clearly seen by
viewing the likelihood function as an un-normalized
posterior density, and imagining that there are two
posterior densities. One is centered around a value
close to $\theta_0$ with a small posterior variance (i.e., the
one based on $Y_{co}$) and the other is centered around
a value farther away from $\theta_0$ but also with larger
spread (i.e., the one based on $Y_{ob}$). It is then debatable
how to compare the two posteriors' respective
strengths in discrediting the value of $\theta_0$; certainly it
is a much harder task than when both posteriors are
centered at the same location.

With our measures we circumvent this problem by
first calculating the log-likelihood ratio or lod score
for the same null value $\theta_0$ and same alternative value
$\theta_1$, given both the observed data and complete data.
We then estimate the unknown value of $\theta_1$, or even
$\theta_0$ when the null is not sharp, by the MLE under the
alternative and null hypotheses, respectively. Alternatively,
as we demonstrated via the simple binomial
example, when the complete-data likelihood is
from an exponential family [which is the case for
the binomial when $p$ is restricted to (0, 1)], what we
proposed was to measure how anti-conservative our
test would be if we imputed the complete-data sufficient
statistics under the alternative hypothesis and
then pretended that they were real data (for $\mathcal{RI}_1$),
or how conservative our test procedure would be if
we imputed under the null and then pretended that
they were real data (for $\mathcal{RI}_0$).

In that sense, the only uncertainty in our measures
is the uncertainty caused by using the observed-data
MLEs for $\theta_1$ and $\theta_0$. This is different from
Zheng and Lo's simulation and variance calculation,
which attempts to capture the conditional variation
in $\widetilde{\mathcal{RI}}_1^{-1}$ given the observed data. However, it is
important to point out that, because Zheng and Lo's
setting treats the alternative value of the hypothesis
as known, their variation is also different from
the actual (conditional) variation in the ratio of the
complete-data lod score and the observed-data lod
score. The latter would be

(2)  $\dfrac{\mathrm{Var}[\mathrm{lod}(\theta_{co}, \theta_0 \mid Y_{co}) \mid Y_{ob}; \theta]}{\mathrm{lod}^2(\theta_{ob}, \theta_0 \mid Y_{ob})},$

which then can be evaluated at $\theta = \theta_{ob}$, as Zheng
and Lo suggested. Which of these variance calculations
is most relevant for practical purposes is worth
exploring, and we thank Zheng and Lo for
their recognition of this issue.
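As a rough illustration of how a quantity like (2) might be evaluated, the sketch below computes the conditional variance of the complete-data lod score, scaled by the squared observed-data lod score, in a hypothetical binomial setting. It assumes `x` successes in `n_ob` observed trials with `0 < x < n_ob`, `m` missing trials, and a null value `p0` different from `x / n_ob`; the parameter is set to the observed-data MLE and the missing data are enumerated exactly. The function names are ours, not Zheng and Lo's.

```python
from math import comb, log

def lod(p1, p0, successes, trials):
    """Log-likelihood ratio of p1 versus p0 for binomial data (natural log)."""
    failures = trials - successes
    return successes * log(p1 / p0) + failures * log((1 - p1) / (1 - p0))

def scaled_lod_variance(x, n_ob, m, p0):
    """Var[lod at complete-data MLE | observed data] / (observed lod)^2."""
    p_ob = x / n_ob  # observed-data MLE, also used to draw the missing data
    lod_ob = lod(p_ob, p0, x, n_ob)
    first = 0.0   # running first moment of the complete-data lod
    second = 0.0  # running second moment of the complete-data lod
    for k in range(m + 1):
        weight = comb(m, k) * p_ob**k * (1 - p_ob)**(m - k)
        p_co = (x + k) / (n_ob + m)
        value = lod(p_co, p0, x + k, n_ob + m)
        first += weight * value
        second += weight * value * value
    return (second - first * first) / lod_ob**2

print(scaled_lod_variance(33, 50, 50, 0.5))
```

With no missing trials (`m = 0`) the function returns zero up to rounding, as it should, since the complete and observed data then coincide.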

It is worth reiterating here that the range of ge- 
netics/genomics applications of the proposed mea- 
sures of information is expanding with every high- 
throughput technology that is developed in this rapidly 
moving field. For example, in many applications, the 
individual genotypes on the genome are not mea- 
sured deterministically; instead, a distribution on all 
possible states is inferred from the raw data. Examples
of this include: (i) genotype calling using data
from the new sequencing technologies such as those
from Solexa and Applied Biosystems, where uncertainty
in calls comes from technical errors, sequence
assembly and sequence similarity (Brockman et al.
(2008)); (ii) imputation of genotypes for untyped
markers using information from a reference database
such as HapMap, where uncertainty is caused by
imperfect prediction and by the size of the training
data set (Nicolae (2006)); and (iii) calling genotypes
of Copy Number Variation (CNV), where the variability
is caused by uncertainty in the boundaries of
the CNVs and by technical variability in the probe
measurements (Redon et al. (2006)). In all of these
situations, instead of data yielding a genotype, G, 
the raw information is processed into a distribution 
on all possible values for G, P(G|data). These dis- 
tributions can be used, for example, in testing for 
genetic association of a disease or quantitative trait 
with the marker under investigation. The measures 
proposed in our paper can be applied directly (sim- 
ilarly to the haplotype application presented in the 
paper) to quantify the amount of information rela- 
tive to having observed the genotypes. The measures 
are important because it is possible, with additional 
laboratory work, to determine the genotypes with 
certainty. The complications arise when information 
on different markers that are in the same biological 
unit (such as a gene or a pathway) are combined 
into a single association test. This is the case where 
the discussion above is relevant and further research 
is necessary. 

3. DOSS: SO WHAT WENT WRONG WITH 
PARTIAL LIKELIHOOD? 

We very much appreciate Doss's exploration of
applying our measures to the survival analysis setting,
and were very intrigued by the problems he
reported with Cox's partial likelihood. As we stated
in the first section of our paper, one basic requirement
in measuring relative information is that we
need to assume that the procedure under investigation
is "optimal" in some sense (e.g., being
full-likelihood based). This requirement is needed to
prevent paradoxical situations where less data can lead
to more information, much like the "self-efficient"
requirement in Meng (1994). A good illustration of
such a situation is a least-squares regression in which
the variance depends on the value of the covariate.
While the ordinary least-squares estimators are
robust in the sense of still being consistent in
the presence of heteroscedasticity, they are not
self-efficient (Meng (1994)) because one can have a much
more efficient least-squares estimator with fewer data
if the additional data happen to be those with much
higher variances; see Meng (2001) for a detailed
illustration. So Doss's finding, that $\mathcal{RI}_1$ may not be
less than 1 for some of the data sets he used, reminded
us to look into the possibility that the partial
likelihood approach may fail this basic requirement.
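The least-squares phenomenon mentioned above is easy to reproduce with a small calculation. The sketch below is a hypothetical example of our own, not the one in Meng (2001): it assumes a no-intercept regression whose noise variance grows as the fourth power of the covariate, and compares the exact variance of the ordinary least-squares slope with and without a high-variance design point.

```python
def ols_slope_variance(xs, noise_variances):
    """Exact variance of the no-intercept OLS slope sum(x*y) / sum(x*x)
    when observation i has noise variance noise_variances[i]."""
    sxx = sum(x * x for x in xs)
    weighted = sum(x * x * v for x, v in zip(xs, noise_variances))
    return weighted / sxx**2

# Hypothetical design; the noise variance is assumed to equal x**4.
xs = [1.0, 1.0, 3.0]
noise_variances = [x**4 for x in xs]

var_subset = ols_slope_variance(xs[:2], noise_variances[:2])  # first two points only
var_full = ols_slope_variance(xs, noise_variances)            # all three points
print(var_subset, var_full)
```

Here the two-point estimator has the smaller variance, so discarding the noisy third observation makes ordinary least squares more efficient, which is the kind of paradox that self-efficiency rules out.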

When "partial likelihood" is taken to mean liter- 
ally any part of a full likelihood, this failure is obvi- 
ous, because it would be trivial to construct many 
examples where the part chosen is so inefficient com- 
pared with the full likelihood that "self-efficiency" 
cannot possibly hold (even taking into account that 
"self-efficiency" is a weaker requirement than the 
usual full efficiency). So the question of real inter- 
est here is what happens in the specific case of Cox's 
partial likelihood for the proportional hazard model, 
an approach that is often considered to produce re- 
sults as good as the full likelihood method, at least 
for practical purposes. The answer to this question, 
however, is not straightforward. 

The simplest situation is when there is no cen- 
soring, in which case it is known that Cox's par- 
tial likelihood for the proportional hazard model is 
also a genuine likelihood based on part of the data, 
that is, on the ranks of all the observed failure times 
(Fleming and Harrington (1991), Chapter 4). Since 
it is a genuine likelihood, it must be self-efficient,
and there should be no problem in applying our (16)
or any subsequent formulas, as long as they are implemented
correctly (see below). When there is censoring,
the discussion in Fleming and Harrington
(1991) shows that a further sacrifice of efficiency is
needed in order to arrive at Cox's partial likelihood
via the rank-data formulation. Currently we are unable
to determine the impact of this further sacrifice
on self-efficiency.

What we are able to determine, or rather to de- 
tect, however, is that there is another reason that 
can explain Doss's "surprising findings," even if the 
self-efficiency issue is not relevant. The problem lies 
in how one defines observed data, and by compari- 
son, what constitutes complete data. One might find 
this is a rather odd inquiry — how hard could it be 
to determine what is observed and what is missing? 

To see why this can be a problem, let us set up
the notation carefully. Using Doss's $D$ notation for
data, we distinguish three data sets: $D_{full}$ is the full
data set that would be observed if there were no
censoring, $D_{cens}$ is the available/observed censored
data, and $D_{part}$ is Cox's partial data, that is, the
actual data used for calculating Cox's partial likelihood
function.

Given this setup, we can use $\mathcal{RI}_1$ to measure the
loss of information due to censoring by setting
$\{Y_{ob} = D_{cens}, Y_{co} = D_{full}\}$, using our generic notation; we
believe Doss's first reported $\mathcal{RI}_1$ value, 0.987, is for
this purpose. We can also measure the loss of information
from using the partial likelihood approach
compared with the full-likelihood approach, which
corresponds to setting $\{Y_{ob} = D_{part}, Y_{co} = D_{cens}\}$. Doss
does not seem to provide such a measure. We remark
that we may also measure the loss of information
of using $Y_{ob} = D_{part}$ compared with using
$Y_{co} = D_{full}$, though this $\mathcal{RI}_1$ may not be numerically
the same as the product of the previous two because
they assume different observed data in computing
the MLEs and take different conditional expectations
over the missing data.

The setting Doss provided is, however, more complicated.
Imagine that we had collected additional
samples, possibly censored. Let $D^a_{cens}$ denote the augmented
data set that includes $D_{cens}$: $D_{cens} \subset D^a_{cens}$.
We then obviously can ask what is the relative information
in $Y_{ob} = D_{cens}$ compared with the augmented
sample $Y_{co} = D^a_{cens}$. This is, we believe, what Doss
intended. However, since Cox's partial likelihood is
a very popular approach, Doss wanted to measure
the relative information when using the partial likelihood,
not the full likelihood.

Because Cox's partial likelihood uses the partial
data $D_{part}$, we then should set $\{Y_{ob} = D_{part}, Y_{co} = D^a_{part}\}$,
where $D^a_{part}$ is Cox's partial data from the
augmented sample $D^a_{cens}$. That is, the moment we
decide to measure the relative information for using
Cox's partial likelihood approach, our relative information
is no longer about $Y_{ob} = D_{cens}$ relative to
$Y_{co} = D^a_{cens}$, but rather about $Y_{ob} = D_{part}$ relative to
$Y_{co} = D^a_{part}$, because the latter are the actual data
sets used by the Cox regression.

Recognizing the correct $Y_{ob}$ and $Y_{co}$ directly affects
how we compute, among other things, the denominator
of $\mathcal{RI}_1$. With $Y_{ob} = D_{part}$ and $Y_{co} = D^a_{part}$,
the conditional expectation called for by the denominator
of $\mathcal{RI}_1$ of (18) in our paper should be with
respect to

(3)  $f(Y_{co} \mid Y_{ob}; \theta_{ob}) = f(D^a_{part} \mid D_{part}; \theta_{ob}).$

However, the conditional distribution Doss actually
used in his Monte Carlo simulation appears to be

(4)  $f(Y_{co} \mid Y_{ob}; \theta_{ob}) = f(D^a_{cens} \mid D_{cens}; \theta_{ob}).$

The critical difference between (3) and (4) is in what
is being conditioned upon, namely, $D_{part}$ versus $D_{cens}$.
(The difference between the two choices of $Y_{co}$ is less
important here because $D^a_{part}$ is a deterministic function
of $D^a_{cens}$, so if we can calculate or simulate with respect
to a correctly specified conditional distribution
of $D^a_{cens}$, then we can do so for any of its
functions/margins.) We point out this difference because
the use of (3) is consistent with our original definition,
as it uses the same observed data set for both
the numerator and denominator of $\mathcal{RI}_1$. Using (4),
however, will result in unclear consequences. For one
thing, our key inequality (16) is no longer guaranteed
to hold because the "Kullback-Leibler information"
part would then be of the form
$\int p_1(x) \log[p_2(x)/p_0(x)] \mu(dx)$, which is not guaranteed to be
nonnegative when $p_1(x) \ne p_2(x)$.
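A two-point example shows how such an integral can turn negative once the averaging density differs from the density in the numerator of the log ratio. The sketch below uses hypothetical discrete distributions (`p0`, `p1`, `p2` are our illustrative choices) and evaluates the sum of `p1(x) * log(p2(x) / p0(x))` in the matched case, where it is a genuine Kullback-Leibler divergence, and in a mismatched case.

```python
from math import log

def weighted_log_ratio(p1, p2, p0):
    """Sum over x of p1(x) * log(p2(x) / p0(x)) for discrete distributions."""
    return sum(a * log(b / c) for a, b, c in zip(p1, p2, p0))

# Hypothetical two-point distributions.
p0 = [0.5, 0.5]
p2 = [0.9, 0.1]

matched = weighted_log_ratio(p2, p2, p0)             # averaging density equals p2
mismatched = weighted_log_ratio([0.1, 0.9], p2, p0)  # averaging density differs from p2
print(matched, mismatched)
```

The matched value is positive, as the Kullback-Leibler divergence must be, while the mismatched value is negative.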

Doss's explanation of his "surprising findings" is
also based on an inconsistency, but it is the inconsistency
between including some censored observations
for the denominator versus only using the uncensored
cases for the numerator. Our investigation
above, however, reveals that the problem lies in using
the ranks of the failure times, as in $D_{part}$ and
$D^a_{part}$, which is not the same as using the failure
times themselves, as in $D_{cens}$ and $D^a_{cens}$. This difference
is irrespective of censoring, because even without
censoring, in which case $D_{cens} = D_{full}$, the critical
difference between the conditioning in (3) and
in (4) remains.

Intriguingly, the need for setting up notation carefully
is demonstrated by another more subtle difference
between (3) and (4), at least when there is no
censoring. In both (3) and (4), we used the generic
notation $\theta_{ob}$ to denote an estimator of $\theta$ based on
the observed data. However, in the current setting,
$\theta$ consists of both the parameter of interest, $\beta$, and
the (infinite-dimensional) nuisance parameter $\Lambda_0$,
the baseline cumulative hazard. This recognition immediately
reveals a problem for (3), because there
is little information in $D_{part}$ for estimating $\Lambda_0$. After
all, the most celebrated feature of Cox's partial
likelihood is its ability to estimate $\beta$ without having
to deal with $\Lambda_0$.

When there is no censoring, this problem also turns
out to be the solution because $f(D^a_{part} \mid D_{part}; \theta)$ is
actually free of $\Lambda_0$, a consequence of the fact that
Cox's partial likelihood is identical to the full likelihood
of $\beta$ based on the ranks alone. One therefore
can carry out (3) by calculating or simulating with
respect to $f(D^a_{part} \mid D_{part}; \beta = \beta_{ob})$, where $\beta_{ob}$ is the
Cox regression estimator based on $D_{part}$.

When there is censoring, the picture becomes less
clear, because it is then possible for $f(D^a_{part} \mid D_{part}; \theta)$
to depend on the baseline $\Lambda_0$. This is not a contradiction
to the celebrated feature of Cox's partial
likelihood, that is, its robustness to the specification
of $\Lambda_0$. The relative information $\mathcal{RI}_1$ itself may well
depend on the actual distribution of the failure time
when there is censoring, because the probability of
censoring generally depends on the actual distribution
of the failure time. What this means is that
whereas we can still define $\mathcal{RI}_1$ theoretically as we
did, it cannot be estimated using $D_{part}$ alone. This
dilemma could be taken as a defense for using (4),
at least for practical purposes, especially considering
the difficulties in implementing (3) even if $\theta$ is
known.

However, to avoid the type of "surprising findings"
that Doss found, we would resolve this dilemma by
nonetheless using (3) but with the nuisance parameter
$\Lambda_0$ estimated from $D_{cens}$, for instance using the
Nelson-Aalen estimator used by Doss. That is, $D_{cens}$
enters the calculation only through the estimation
of $\Lambda_0$. This dependence on $D_{cens}$ will not cause the
type of problems that Doss reported, because it does
not alter the conditioning as called for by (3) and
because our (18) permits its numerator and denominator
to depend on different parts of the same $\theta_{ob}$.
Of course, this dependence makes uncertainty quantifications,
such as those emphasized by Zheng and
Lo, even more important, as well as more complicated,
because $\Lambda_0$ is an infinite-dimensional nuisance
parameter.

In a nutshell, all these complications remind us 
of the great caution we must exercise once we devi- 
ate from the full-likelihood setting. Indeed, whereas 
we recognized early the existence of an alternative 
explanation of Doss's finding, one of our initial ex- 
planations itself was a product of our lacking full ap- 
preciation of the theoretical intricacy of Cox's par- 
tial likelihood. We are certainly grateful to Doss for 
providing such a rich and intricate example, even 
though, or perhaps especially because, we were nearly 
tripped up by it! 

We also very much appreciate Doss's attempt to 
generalize our measure to the nonlikelihood setting. 
Indeed, our motivating examples, both the toy ex- 
ample with the binomial distribution and the real 
genetic applications, are for nonlikelihood types of
testing, either with a Wald-type test in the binomial
case or with non-parametric lod scores in the
genetic setting. However, precisely for the "non-self- 
efficient" reason discussed above, it soon became 
clear to us that in order to avoid paradoxical sit- 
uations where fewer data may lead to more infor- 
mation, we need to associate a test with a model in 
order to proceed, as we did in Section 2.3. 

If we understand Doss's notation correctly, his
$\mathcal{RI}_W$ can be obtained from our $\mathcal{RI}_1$ by first associating
his tests with normal models, and hence the
likelihood ratio test is the same as the Wald test. It is
easy to verify that once we associate the complete-data
test with the normal model (i.e., pretending
the large-sample approximation is exact), the denominator
of $\mathcal{RI}_1$ is the same as the denominator of
Doss's $\mathcal{RI}_W$ as given in his (5). If we further associate
the observed-data test with the normal model,
then the numerators of $\mathcal{RI}_1$ and $\mathcal{RI}_W$ will be the
same, and hence $\mathcal{RI}_W$ will be identical to $\mathcal{RI}_1$.

An astute reader might question why we need to 
associate the normal model with the complete-data 
test and observed-data test separately. Should not 
the complete-data model automatically imply the 
observed-data model? The answer is "yes" if both 
the complete-data test and the observed-data test 
are derived from a coherent probability model (e.g., 
if both are likelihood ratio tests). However, when 
tests are derived nonparametrically, or even para- 
metrically but without following the full-likelihood 
recipe (for instance, using a partial likelihood), there 
is no guarantee that the two tests are "coherent" 
with each other in the sense that by integrating out 
the missing values in the complete-data associated 
model one would automatically obtain the observed-data
associated model. Indeed, Doss's $\mathcal{RI}_W$ can also
exceed 1 if the variance of the complete-data test
statistic is larger than that of the observed-data test
statistic, a phenomenon that can occur with an ordinary
least-squares estimator, as discussed above. A
logical conclusion is then that even when $\mathcal{RI}_W$ seems
to be "likelihood free," fundamentally its rationality
is guaranteed only when a (normal) likelihood
family can be associated with it.

4. C3H: INFECTIOUS DISEASE STUDIES
AND ISOTONIC REGRESSION

We are pleased to see that C3H took on the task 
of implementing our suggested Bayesian measures in 
the context of infectious disease and regression. For 
infectious disease, C3H's goal was to decide whether
to invest in finding out the infectious times for the 
existing cases for which only the removal times are 
known, or in finding additional families/individuals 
whose removal times are known (but whose infec- 
tious times are unknown). This consideration is im- 
portant here because identifying the infection time 
is typically much harder (if possible at all) than 
identifying the removal time (e.g., death time). For 
the isotonic regression application, C3H considered 
the design issue: whether to add more measurements 
at the existing design points or to add new design 
points that interlace with the existing design points. 

While we are excited by these new applications, 
we are somewhat puzzled, and worried, by C3H's
findings in both examples. For the infectious dis- 
ease example, our intuition would suggest that iden- 
tifying infection times would be more important for 
testing efficacy of vaccine than finding more individ- 
uals with only removal times known, especially when 
it is not clear (at least to us from the model descrip- 
tion given by C3H) whether "removal" here means 
death or cure (and thus possible immunity). C3H 
gave an example where the measured relative infor- 
mation in 20 households with only removal times is 
about 80% compared with the situation in which ev- 
eryone's infection time is also known. But it is only 
about 30% relative information compared with hav- 
ing four additional households with removal times 
only. This sharp difference is a surprise to us, and 
makes us wonder whether it is a reflection of issues 
with C3H's (BI3) or a defect in implementation (e.g.,
failure of an MC algorithm). 

Similarly, we are surprised to see that, in the con- 
text of testing for monotonicity of a regression func- 
tion, doubling the measurements at existing design 
points creates substantially more information than 
adding an equal amount of new design points inter- 
laced with existing design points. C3H gave an ex- 
ample where the observed data only have about 15% 
information relative to the former design, compared 
with 35% information relative to the latter design. 
This is rather counterintuitive, because for estimat- 
ing a response surface with a fixed number of mea- 
surements, it is often wise to spread out more design 
points rather than to take more measurements on 
fewer design points. For example, for the simple linear
regression $y_i = \beta x_i + \epsilon_i$ (the one that generated
C3H's data), the variance of the least-squares estimator
would be inversely proportional to $S_x = \sum_i x_i^2$;
for C3H's setting, $S_x = \sum_{i=0}^{9} (i/9)^2 = 95/27$. Doubling
the number of measurements at each existing
design point clearly will double $S_x$: $S_x = 190/27 = 7.037$.
On the other hand, C3H's second design, if
we understand their description correctly, is to use
$i/12$, $i = 1, \ldots, 5, 7, \ldots, 11$, as the additional 10 design
points. Under this design,
$S_x = \sum_{i=0}^{9} (i/9)^2 + \sum_{i=1}^{11} (i/12)^2 - (6/12)^2 = 1465/216 = 6.78$.
So while the first design is indeed slightly better, the relative
variance ratio is 96%, nowhere near the 2.5-fold
increase in information suggested by C3H's results
(0.346/0.139 = 2.5). Of course, we understand
that C3H are measuring information in testing, not 
estimation, and their method is far more sophisti- 
cated than the simple linear regression. Neverthe- 
less, we find the 2.5-fold increase rather counterin- 
tuitive, and would be very interested in seeing it 
confirmed independently in a different way. 
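The design arithmetic above can be reproduced exactly with rational numbers. The sketch below assumes our reading of the two designs (existing points $i/9$ for $i = 0, \ldots, 9$; interlaced points $i/12$ for $i = 1, \ldots, 11$ excluding $i = 6$); the variable names are ours.

```python
from fractions import Fraction

def sum_of_squares(points):
    """S_x = sum of squared design points."""
    return sum(p * p for p in points)

# Existing design points i/9, i = 0, ..., 9.
existing = [Fraction(i, 9) for i in range(10)]
# Ten additional interlaced points i/12, i = 1, ..., 11, skipping i = 6.
interlaced = [Fraction(i, 12) for i in range(1, 12) if i != 6]

s_observed = sum_of_squares(existing)                    # observed design
s_doubled = 2 * s_observed                               # every point measured twice
s_interlaced = s_observed + sum_of_squares(interlaced)   # existing plus new points

print(s_observed, s_doubled, s_interlaced)
print(float(s_interlaced / s_doubled))  # ratio of S_x values between the two designs
```

Using `Fraction` avoids rounding, so the printed values can be compared directly with the fractions quoted in the text.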

C3H also touch on the intricate issue of dealing 
with nuisance parameters under the null. They sug- 
gest two ways of averaging: either averaging the nu- 
merator and denominator separately and then tak- 
ing the ratio (BI3), or directly averaging the ratio 
(BI4). Here all averaging is performed with respect 
to the posterior distribution of the nuisance param- 
eter under the null. As we discussed in Section 6.3 
(and elsewhere) of our paper, dealing with nuisance 
parameters is a complicated issue, even with the 
Bayesian approach, because we do not have reliable 
priors for them, nor do we know enough about the 
sensitivity of these measures, including C3H's, to the
choice of priors. Therefore, understanding the theoretical
properties of C3H's (BI3) and (BI4) could
be an important step toward establishing a general 
scheme for dealing with nuisance parameters in the 
context of measuring the fraction of missing infor- 
mation. 

5. POSSIBLE THESIS TOPICS 

As we concluded in our paper, much remains to 
be done, especially with small sample sizes. The 
three discussions vividly demonstrate this, and point 
clearly to a number of concrete research directions. 
Here are a few possible thesis titles inspired by the 
discussions: 

• On Optimal Follow-up Designs in Genetic Hy- 
pothesis Testing Problems. 

• Measuring Uncertainty in Relative Information 
Estimation. 
• On Measuring Relative Information for Semiparametric
Models.

• Measures of Information for Artificial Likelihoods. 

• Implementing Bayesian Relative Information Mea- 
sures for Designing Infectious Disease Studies. 

• Optimal Design Strategies for Testing Regression 
Functions Under Constraints. 

• Dealing with Nuisance Parameters in Measuring 
the Fraction of Missing Information. 

Some of these topics are middle-hanging fruits 
waiting to be picked, so if you are a thesis-topic 
seeking student reading this set of discussions in the 
reverse order, go to the first page as soon as possible! 